perm filename VV.DOC[VV,BGB] blob
sn#135775 filedate 1974-12-17 generic text, type C, neo UTF8
TITLE PAGE
VERIFICATION VISION
Bruce G. Baumgart
Robert C. Bolles
Abstract:
Main points (for abstract):
A system organization for VV ... steps, models and tolerances
Automatic prediction of `features' ... including curves
Training ... including the location and description of curves
Location and comparison ... simple `fixed' strategy ... use 1 to find 2nd
Correction ... mathematics for transforms, 2-D ↔ 3-D etc.
Contents:
1. Introduction.
2. Verification Vision System Design.
2.1 VV Mandala
2.2 Visual Representations
2.3 Other Research
3. Prediction.
3.1 Prediction by simulation - BGB
3.2 Prediction by training - RCB
4. Comparison - RCB
Comparison by correlation
Comparison by feature elements.
5. Correction - BGB
Correcting the camera model.
Correcting the world model.
6. Application of Verification Vision.
7. Conclusions.
8. References.
1. INTRODUCTION
Introduction
Definition ... not yes/no ... not recognition
... predict ... compare ... correct
covers (1) hypoth and test (2) stereo (3) pred environment ...
example ... screw in hole (monocular)
=> (1) predictable w/i tolerances ... impt. tolerances
(2) many types of info ...
in the past there have been some "special purpose" vision hacks (pump)
want to "systematize and automate" as much as possible
... many applications: automation, cart, ... (pump paper for back)
... also we believe there exists a sufficiently
interesting set of low-level operators
Our approach to paper ... "theory and grand scheme" plus history section 2
... sections on what we have done ... trying to keep speculation
to a min ... and then in the conclusion come back to the grand
scheme and bring things together ... and explore the next
extensions and their potential (& difficulty)
Verification vision is the process of synchronizing visual
prediction with visual perception. Such verification may be
performed on several levels of abstraction: 2-D images and 3-D models, as
well as semantic descriptions. However, in our recent work
on vision for a robot factory worker, predicted images
can be obtained which are nearly identical to the perceived images.
In such a case, the verification is done for the sake of measuring
small geometric differences which are expected but which cannot be
rapidly measured by other means. That is, the identities of the image
elements are not in question, only their precise relative positions.
Verification vision also includes "hypothesis and test" where
a predicted line at a certain span of
location, orientation and contrast is compared
with a line from a perceived image
as in [FALK] and [SHIRAI].
Finally, verification vision includes narrow angle
correlation stereo. In this case the "prediction" is another image of the
same objects, but taken from a slightly different relative position.
The goal is to locate matching "features" (such as correlation patches)
in order to provide the stereo package with two positions for the same
part of the scene (see [Thomas]). Notice, as mentioned above, that the
identities of the models (ie. the line and correlation patch) are not in
question; only their positions in the actual image.
(programmable assembly system).
Such systems provide
complex, but predictable environments consisting of objects with curved,
textured surfaces. There have been a few special-purpose programs which
perform verification vision tasks within such environments (eg. see
[BOLLES], [ROSSOL], ...), but there have been no generalized systems
which predict and locate curved objects. Garvey and Agin at SRI have
each set up systems which deal with real objects, but are only peripherally
concerned with shapes.
In this paper we present a design
for verification vision and describe a
system which has carried out the task of visually locating a bolt hole
in a brake assembly and visually servoing a bolt into the hole.
The brake assembly's
initial location was known to within plus or minus 10mm for
both X and Y, and plus or minus 10 degrees rotation about its center.
The location of the hole involved predicting and locating curves. The
servoing is done in a stop-and-go fashion. That is, the arm is moved and
stopped, a pair of stereo pictures is taken, a relative arm correction
is computed, and the arm is moved again.
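The stop-and-go servoing just described can be sketched as a simple loop. This is an illustrative sketch only: move_arm, take_stereo_pair, and compute_correction are hypothetical stand-ins for the arm, camera, and vision routines, and the stopping tolerance is an assumed value, not one from the system.

```python
# Illustrative sketch of the stop-and-go servo loop; all three callables
# are hypothetical stand-ins for the arm, camera, and vision routines.

TOLERANCE = 0.5  # mm; an assumed stopping threshold, not from the text

def servo_loop(move_arm, take_stereo_pair, compute_correction, target):
    """Alternate moving and looking until the residual correction is small."""
    while True:
        move_arm(target)                      # the arm is moved and stopped
        left, right = take_stereo_pair()      # pictures taken while stopped
        dx, dy, dz = compute_correction(left, right)
        if max(abs(dx), abs(dy), abs(dz)) < TOLERANCE:
            return target                     # close enough: done
        target = (target[0] + dx, target[1] + dy, target[2] + dz)
```

Each pass through the loop matches one move/look/correct cycle of the task described above.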
The next section describes our theory of computer vision
and shows how verification vision fits into this theory.
That section also characterizes some of the other previous vision
research. The following sections use the task mentioned above to guide
the description of the current implementation of our verification vision
system.
2. Verification Vision System Design.
2.1 VV Mandala
2.2 Visual Representations
2.3 Other Research
"TASK"
STEPS TO BE TAKEN FOR EACH SUBASSEMBLY (during a production run)
(Actual and Synthetic pictures of the task)
    In parallel with the vision steps below, use prediction to have
    the arm pick up a BOLT (& SUPPORT).

    locate the hole with CAMERA 1
        ↓
    compute an estimate for the 3-D change from the prediction
        ↓
    use CAMERA 1's estimate to locate the hole with CAMERA 2
        ↓
    compute the 3-D change from predicted to actual
    (using stereo to compute 3-D location)
        ↓
    (the two branches join here)
    use 3-D change to correct the destination of the bolt
        ↓
    have the arm move the bolt to this position
        ↓   (loop:)
    use best hole position to locate the bolt with CAMERA 1
        ↓
    use arm's Z to compute an estimate for the 3-D position of the bolt
        ↓
    use this estimate to locate the bolt with CAMERA 2
        ↓
    use stereo to compute 3-D location and correction for the next arm move
        ↓
    move the arm appropriately ... if force sensors indicate that the bolt
    hit the side of the hole, stop ... if vision indicates that the screw
    is in the hole (by 3-D position, occlusion, ...), stop ... otherwise
    loop back to locating the bolt with CAMERA 1
Proposed task continued
(Actual picture of brake subassembly and bolt)
(Predicted picture of brake subassembly and bolt)
MODELLING, TRAINING, CALIBRATION, AND PROGRAMMING THE STRATEGY
GEOMED models of the brake subassembly and bolt
including curves (possibly just circles projecting
into 2-D ellipses ... want to be able to find out
the 3-D points that correspond to the points in the
2-D projection ... this would mean that one could
find out what points belong to a curve and simply
fit a curve thru them ...) ... these models also
should have photometric info (to generate synthetic
pictures and roughly estimate the contrast across
edges etc.) ... seems possible to automatically generate
the WALTZ labelling for all of the lines and curves if
they aren't too complicated ... this would be a help
for the "characterization" stage in "training"
**** NEED CURVE EXTENSION TO GEOMED ... IS THERE CURRENTLY
PROVISION FOR PHOTOMETRIC INFO??? HOW ABOUT THE
2-D REFERENCE BACK TO THE CORRESPONDING 3-D POINT???
TO DO THE WALTZ LABELLING NEED TO KNOW WHY A LINE
APPEARS IN A 2-D DRAWING (CONCAVE EDGE, CONVEX,
ONE SURFACE OCCLUDING ANOTHER, A SHADOW, CRACK, ETC.)
IS THAT TYPE OF INFO RETRIEVABLE??? ****
TRAINING ... step thru the procedure given above ... interactively
keeping the internal model of what is going on in
synchronization with the actual situation (as viewed by
the cameras and monitored by the arm)
TRAINING essentially consists of using the models to
predict what will be seen, taking pictures to get what
is actually seen, and updating (extending) the model
so that it makes better predictions ...
TRAINING (potentially) produces a number of things:
actual pictures of an example assembly
**** FOR VIDEO COMPARE (CORRELATION) ****
final calibration of the two cameras with respect
to each other and the work station
(compare syn w/ act.) final photometric calibration (light levels etc.)
**** ACTUAL USE MAY BE LIMITED TO
RANGE OF CONTRAST, ETC. ****
characterizations of the features, eg. the contrast
across an edge, the confidence of finding the
best correlation for a certain patch, etc.
**** FOR TOPOLOGICAL COMPARE ****
(diagram showing implied position of curve and reduction of tolerances)
    estimates as to how accurate the implications are which reduce
    the tolerances between where a feature is expected and where it
    might be (eg. how beneficial is it to have an edge point on
    curve 6 ... what reduction in tolerances can be made) ... also
    should point out any possible confusing edges, correlations,
    etc.
**** TOLERANCES ARE IMPORTANT FOR DETERMINING
WHICH TECHNIQUES TO USE ... FOR OBJECT POSITION,
CAMERA POSITION, LIGHT LEVELS, POSSIBLE
OCCLUSIONS, ETC. ... THE SYSTEM WILL PROBABLY
USE ONLY RECTANGLES (IN 2-D) TO REPRESENT
THE TOTAL ALLOWABLE FLUCTUATION ... TAYLOR
HAS SOME FANCIER THINGS WHICH MAY BE USEFUL
... OR AT LEAST POSSIBLY PRETTY ENOUGH TO SHOW
IN A DIAGRAM OR TWO ...****
LOCATE THE HOLE WITH CAMERA 1
Position the subassembly at (X0,Y0) and aim camera 1
as desired ... to a "known" position
(pic showing overlay of pred. on actual)
    Use these positions to produce the expected view (using
    hidden line elimination, curves, etc. to first produce
    a line drawing ... then a synthetic picture ... and
    finally as much of the Waltz-like information as possible)
(maybe diag showing adjustment)
    Use this expected view (mosaic +) to automatically
    locate the desired features (possibly altering the expected
    curves or the portion in the 3-D model which projects into
    that curve ???) and extract the characterization of the
    features ... probably will have to be interactive as
    opposed to completely automatic ... however, since training
    is only done once, it seems ok if more time is required
    to do large searches to find the features ... hopefully
    the information gained will reduce the amount of this
    searching at run-time.
**** ADJUSTING COMES IN TWO FORMS, AT LEAST, (1) MOVING
AROUND IN THE 2-D PICTURE TO FIND THE APPROPRIATE MATCHING
POINT AND (2) MODIFYING THE RELATIVE TRANSFORM BETWEEN
THE CAMERA AND THE SUBASSEMBLY ... ESSENTIALLY THE IRON-
TRIANGLE WORK ****
(diagram with possible features ranked by cost/benefit).
At this point the system could roughly rank the features
according to (1) how easily they can be found (eg. large
and with contrast) and (2) how beneficial it would be to
find it (eg. what reduction in tolerances might be
expected)
**** CURVES COULD BE RANKED BY LENGTH AND CONTRAST
PLUS THEIR CURVATURE ... THE MORE CURVATURE, THE BETTER
IMPLICATIONS ONE CAN MAKE ABOUT WHERE YOU ARE ON THE CURVE,
CORRELATIONS BY SIZE AND THE DISTINCTIVENESS OF THEIR
AUTOCORRELATIONS (OR WHATEVER) ... REALLY ONLY USED
TO GIVE THE PROGRAMMER HELPFUL HINTS AS TO THE GOODNESS
OF THE VARIOUS FEATURES ... AND AS DEMO OF A STEP TO
COME IN AUTOMATIC STRATEGY PRODUCTION ****
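As a concrete reading of the cost/benefit ranking sketched above, the following toy scorer orders features by size, contrast, and curvature against the cost of finding them. The field names and the scoring formula are assumptions chosen for illustration, not the system's actual measure.

```python
# Toy cost/benefit ranking of features; the fields and the weighting
# are illustrative assumptions, not the system's actual measure.

def rank_features(features):
    """Sort features so the cheapest, most informative come first.

    Each feature is a dict with hypothetical fields:
      'cost'      - e.g. expected search effort for the operator
      'contrast'  - contrast across the edge (ease of finding)
      'size'      - length of a curve or side of a correlation patch
      'curvature' - 0 for a straight edge; more curvature pins down
                    position along the curve better
    """
    def score(f):
        benefit = f['size'] * f['contrast'] * (1.0 + f['curvature'])
        return benefit / f['cost']
    return sorted(features, key=score, reverse=True)
```

A ranking like this would only give the programmer helpful hints, as the note above says; it is a stand-in for a future automatic strategy step.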
(diagram from thesis)
Having located the various features (a number of which
will be correlation points) the `iron triangle' method
can be used to determine the transform between camera 1
and the subassembly ... possibly a version of this
`calibration' could be set up which takes more than
three matching points ... ie. overdetermined system
**** WHAT IS THE STATE OF THE "IRON TRIANGLE" METHOD?
IS THERE ANY REASON TO TRY A FIT-WHEN-OVERDETERMINED
VERSION OF IT??? ANY IDEA HOW ACCURATE IT IS??? ****
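A fit-when-overdetermined version of the three-point calibration can be phrased as an orthogonal Procrustes problem: given N >= 3 matched 3-D points, find the least-squares rotation and translation between model and data. The sketch below uses the standard SVD solution; it is a modern restatement of the idea, not the iron-triangle routine itself.

```python
import numpy as np

def fit_rigid_transform(model_pts, image_pts):
    """Least-squares R, t with image ~= R @ model + t.

    model_pts, image_pts: (N, 3) arrays of matched 3-D points, N >= 3.
    Standard SVD (Kabsch) solution; handles the overdetermined N > 3 case.
    """
    P = np.asarray(model_pts, float)
    Q = np.asarray(image_pts, float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

With exactly three non-collinear points this reduces to the three-matching-point case; extra points simply average out measurement error.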
The same training steps can be taken for camera 2 ...
(two hidden line drawing with occlusion).
So far, only one position of the subassembly has been
considered. In order to write a program to locate the
hole anywhere within the allowable tolerances (on X, Y,
and the rotation about the center Z vector), the system
should "look at" the various possibilities and make sure
that a sufficient number of the features will be visible
etc. We currently assume that none of the features
change significantly ... ie. the shadows don't change
to interfere with the visual location of features, features
are not obscured by other parts of the subassembly, etc.
If such things were possible, the model for the expected
scene could include explicit alternatives for the
distinctly different appearances of the object.
(overlay tolerance box on picture)
Eventually it would be desirable to have the system
capable of automatically generating a strategy for locating
the hole (or whatever is desired). This would be done
by simulating the various positions within the tolerances
and deciding which features can be used to answer
which questions about the objects location. So far,
the various visual location programs have been interactively
set up to include a fixed sequence of checks. Depending
upon the initial tolerances, various techniques are used
(eg. the hole location might use a couple of curve location
steps because the total displacement may be large ... the
bolt location may only use correlation because the
tolerances at that point are (hopefully) very small).
Our system should at least be able to display the possible
positions (in a picture) for any point of the object.
This is crucial for deciding upon the strategy.
**** SEEMS TO BE ESSENTIALLY PUTTING A BOX AROUND THE
2-D PROJECTIONS OF THE EXTREME POSITIONS ALLOWED WITHIN
THE TOLERANCES ... EXTREME POSITIONS MAY NOT BE COMPLETELY
HONEST AND RECTANGLES ARE CERTAINLY NOT GENERAL ENOUGH
TO TAKE ADVANTAGE OF ALL OF THE INFORMATION, BUT I THINK
THE IDEA IS CLEAR AND USEFUL ****
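The box-around-extremes idea can be sketched directly: enumerate the corner poses allowed by the tolerances, project the feature's points under each, and take the 2-D bounding rectangle. The projection function is a placeholder, the default tolerances echo the ±10mm and ±10 degree figures quoted earlier, and sampling only the corners is, as the note says, not completely honest.

```python
import itertools, math

def tolerance_box(points_3d, project, dx=10.0, dy=10.0,
                  dtheta=math.radians(10)):
    """2-D bounding rectangle of a feature over the extreme allowed poses.

    points_3d: (x, y, z) model points of the feature, given relative to
               the rotation center (an assumption of this sketch).
    project:   hypothetical camera function (x, y, z) -> (u, v).
    """
    us, vs = [], []
    # Only the 8 corner poses of the tolerance cube are sampled.
    for sx, sy, st in itertools.product((-1, 1), repeat=3):
        c, s = math.cos(st * dtheta), math.sin(st * dtheta)
        for (x, y, z) in points_3d:
            xr, yr = c * x - s * y, s * x + c * y   # rotate about Z
            u, v = project(xr + sx * dx, yr + sy * dy, z)
            us.append(u)
            vs.append(v)
    return (min(us), min(vs)), (max(us), max(vs))
```

The returned rectangle is exactly the 2-D search tolerance discussed above: the region within which the feature might appear.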
(wide angle stereo picture)
To recap: features will be located in the two pictures,
matched, and their 3-D position computed. These 3-D
positions will be used to compute the transform from
the planned position to the actual position.
**** DO YOU HAVE ROUTINES TO COMPUTE THE 3-D LOCATION
GIVEN TWO POSITIONS WITHIN 2-D PICTURES ... IE. TO FIND
THE TWO RAYS IN SPACE AND `INTERSECT' THEM OR AT LEAST
FIND THE POINT OF CLOSEST APPROACH ... A LA SOBEL??? OR
OTHERS??? ****
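The point-of-closest-approach computation asked about above has a small closed form: solve a 2x2 least-squares problem for the parameters along each ray and return the midpoint of the common perpendicular. This is a generic reconstruction of that idea, not necessarily the routine the question refers to.

```python
import numpy as np

def intersect_rays(p1, d1, p2, d2):
    """Midpoint of the common perpendicular of two nearly-intersecting rays.

    p1, p2: camera centers; d1, d2: ray directions (need not be unit).
    Solves for scalars s, t minimizing |(p1 + s*d1) - (p2 + t*d2)|.
    """
    p1, d1 = np.asarray(p1, float), np.asarray(d1, float)
    p2, d2 = np.asarray(p2, float), np.asarray(d2, float)
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w = p2 - p1
    denom = a * c - b * b          # zero only if the rays are parallel
    s = (c * (d1 @ w) - b * (d2 @ w)) / denom
    t = (b * (d1 @ w) - a * (d2 @ w)) / denom
    return 0.5 * ((p1 + s * d1) + (p2 + t * d2))
```

When the two rays truly intersect, the midpoint is the intersection; otherwise it is a sensible 3-D position estimate for the matched feature.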
(diagram showing x,y,z change)
LOCATE THE BOLT
The same process can be used to set up the program for
locating the bolt. Remember that there are two distinct
steps possible (1) locate the bolt while it is poised
over the hole (the vision is not as time critical since
the bolt is not moving) and (2) track and servo the
bolt in the hole ... very time critical ... our system
might attempt this ??? ... or do things stop and go???
Stereo is important at this stage because there isn't
the support hypothesis to determine the actual 3-D
positions from 2-D picture location (as there was
for locating the hole). There is, of course, the arm's
measurement of Z, but 1→6mm off in Z makes quite a change
in X and Y because of the angle of the cameras ...
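The sensitivity claimed here is simple geometry: if a camera ray makes an angle theta with the vertical, cutting it with a Z plane that is off by dz shifts the inferred horizontal position by dz*tan(theta). A one-line sketch, with the viewing angle as an illustrative assumption:

```python
import math

def xy_error_from_z_error(dz, view_angle_deg):
    """Horizontal error produced by an error dz in the assumed Z when a
    single camera ray, tilted view_angle_deg from vertical, is cut by
    the (wrong) Z plane.  Illustrative geometry, not the system's code."""
    return dz * math.tan(math.radians(view_angle_deg))
```

At a 45 degree viewing angle, the 6mm of Z uncertainty quoted above becomes a full 6mm of X-Y error, which is why stereo matters at this stage.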
**** DO YOU REALLY INTEND TO DYNAMICALLY SERVO THE
BLOODY ARM ??? IT CERTAINLY SEEMS FEASIBLE IF THE
LOCATION OF THE BOLT CAN BE DONE BY A FEW CORRELATIONS
OR SOME SUCH THING ... THERE ARE REAL DYNAMIC PROBLEMS
THOUGH ... EG. HOW TO GIVE DELTA CHANGES TO THE ARM
ESPECIALLY SINCE ANY CORRECTION WILL HAVE TO INCLUDE
A PREDICTION OF WHERE THE ARM PROGRESSED TO WHILE THE
MACHINE WAS TRYING TO FIND THE BOLT IN THE PICTURE
... ****
2. Verification Vision System Design
VV Overall SYSTEM Organization
(vision in general ... some related work...)
2.1 "Vision Mandala"
Notice this is independent of control structure ... top-down and bottom-up,
VV is by definition, top-down
DV (descriptive vision) is by definition, bottom-up
... roughly characteristics which determine top-down vs. bottom-up
Elements of vision representation
2-D image rep:
"raw data": video, depth
"raw feature pictures": edge, contour, ...
"interpreted features": lines, corners, curves,
3-D image rep:
geometric, space, ... good grief
surface photometry ...
physics (support)
special task rep:
point out how others fit into this scheme ... Roberts, Falk, Waltz, Krakaur
ROBERTS ... parameterized models ... pic, edge, lines & polygons,
topol match to model, pick "best" transform from model to
data, uses support to determine final 3-D position
regions GARVEY ... & SRI PROGRESS REPORT ...
YAKIMOVSKY, AND LIEBERMAN
correlation QUAM ... MARSHA JO
blocks GUZMAN FALK WALTZ GRAPE GILL PERKINS PERKINS
contours KRAKAUR BAUMGART
hidden line ... WATKINS ...
graphics ... GOURAUD, (latest Utah)
Fit VV in ... point out levels possible ... give some tradeoffs and reasons
for dealing at each level ...
Fit in a "grand scheme" and then show "actual scheme ... in pieces"
Task accomplished ... in pieces
purpose ... demonstrate ... relationship to the "grand scheme" ... need
to describe Stanford's system to put the various existing pieces of
the system in perspective (so to speak) ... diagram of steps
3.1 Prediction by simulation - BGB
Prediction
Goal: predict view (eventually whole movie) ... maybe just beginning for
interactive system
model ... 3-D geometric + photometry (GEOMED)
hidden line elim => mosaic with photometry, links to 3-D, and "descriptions"
"descriptions" like Waltz ...
example: circle with obscuring plane in front of it ... approx by lines,
show "labelling" and info given to characterizer ... with why and
how ...
3.2 Prediction by training - RCB
Training
Goal: "second calibration step" of the models (geometric, photometric,...)
... the first step is the initial model ... a third step might
be the "calibration" from one picture of a sequence to the next
(eg. following the bolt into the hole ... slightly different
for each assembly)
logically is another VV problem, but one-shot so less time-dependent
ie. it uses prediction, comparison, and correction
the corrections are different ... updating camera vs. object pos
Another distinction: almost necessarily interactive to
ensure the correct points (features) are matched up ...
"under a teacher's eye"
... described here, because this is its position within a task
Benefits of training:
4. Comparison
Goal of compare: match points of model with points in picture (or features
more generally)
Currently sort of "fixed" strategy ... use big curves until narrowed down
tolerances well enough to use expensive correlations
model used ... and dynamically changes as comparison progresses
Manual override if confusing curves possible, etc. ... not very good ... alt
with curves ... cost/benefit idea
costs used ... cost of edge op, correl, #expected, etc. benefit?
Conservative ... works like ... step thru example
Model for curves is 2-D (in image) ... for correl it is too, but wrt the table
future automatic strategies (spec)
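Comparison by correlation can be sketched as a dense normalized-cross-correlation search for a trained patch inside the 2-D tolerance rectangle. This is a minimal sketch; a real system would want coarse-to-fine search and the dynamically shrinking tolerances described above.

```python
import numpy as np

def best_match(image, patch, box):
    """Best normalized-cross-correlation match for patch inside the
    tolerance rectangle box = (r0, c0, r1, c1) of image.
    A plain dense search over every placement in the rectangle."""
    ph, pw = patch.shape
    p = patch.astype(float)
    p = p - p.mean()
    pn = np.sqrt((p * p).sum())
    r0, c0, r1, c1 = box
    best, best_pos = -2.0, None
    for r in range(r0, r1 - ph + 1):
        for c in range(c0, c1 - pw + 1):
            w = image[r:r + ph, c:c + pw].astype(float)
            wz = w - w.mean()
            denom = pn * np.sqrt((wz * wz).sum())
            score = (p * wz).sum() / denom if denom > 0 else -1.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best
```

The returned score (near 1.0 for a confident match) is the kind of characterization the training step would record for each patch.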
5. Correction.
Correcting the camera model.
Correcting the world model.
Correction
Goal of correction: determine an improved estimate for an object's position
could be relative to some other object (as in our case: bolt tip
wrt the hole) or "sort of absolute" (ie. wrt workstation coords)
model ... stereo ... relative change for arm
6. Application of Verification Vision
DISCLAIMER:
The task was designed to point out the various types of knowledge
available and to demonstrate a system design which is sufficiently
general to take advantage of such knowledge. In particular, this
task was not designed to be the "best" way of accomplishing the task,
but rather as A way with an available hardware configuration. For
example, narrow angle stereo is probably more generally useful for
this type of verification, because fewer features change from one
view to the next.
GOAL: INSERT THE BOLT(S) INTO THE BRAKE SUBASSEMBLY
(actual picture of setup)
"SITUATION" (assumptions listed: general→specific)
programmable assembly environment ... means that there are
cameras, arms, vises, lights, etc. under computer
control ... which in turn means that the environment
is predictable (eg. the lighting)
one arm ... well calibrated, absolute (within 6.0mm) and repeatable
(within 1.5mm)
two cameras ... well calibrated aspect ratio, AR, (within ****)
and focal ratio, FR, (within ****) ... plus roughly
calibrated work station → camera transform (within ****)
lighting ... located at position(s) ... and fixed
bolt dispenser ... in a fixed location and able to dispense bolts
within tolerances 1mm x 1mm x .1mm .......... which means
that the arm (using the repeatability tolerance) can
pick up a bolt within 2.5 mm etc.
**** CAN FAKE IT ... JUST PUT THE BOLT IN THE HAND ****
brake subassembly ... upright, positioned at (X0,Y0) (satisfying
the constraints: -10mm ≤ (actual - X0) ≤ +10mm and
-10mm ≤ (actual Y - Y0) ≤ +10mm ... and the rotation about
its center is +- 10 degrees) ... these are realistic
tolerances resulting from a UNIMATE placing the subassembly
at the desired position at the workstation
Automatic assembly
Cart
one way of looking at this is that a cart with a map of the
road, plus possibly contours, has to do more "revelation"
vision, but as it progresses, it can do verification
vision ... training could be a previous trip along the same
road ... in some sense the relative motion problems are
different (screwdriver ... camera stays still, screwdriver
moves ... with the cart ... the world stays still (more
or less) and the camera moves ... )
a smart cart (everyone ought to have one) should also do recognition
vision ... for cars, cross streets, ...
7. Conclusion
future, future, future, ... I see & I see => I am ⊃ I am (sort of)
Future:
Fancier features & tolerances ... eg. auto correl pred from 3-D, Waltz...
Fancier automatic location of features ... confidence level, 3-D compare mod
Fancier automatic strategy development ...2-D, 3-D, tolerance simul
... modelling relative motion
Fancier math for arbitrary axis of rotation ...
8. References.
Baumgart
Bolles
Falk
Roberts